Skip to content

feat(skills): external MCP server per skill (#37)#38

Open
teslashibe wants to merge 9 commits into
mainfrom
feat/skill-mcp-servers
Open

feat(skills): external MCP server per skill (#37)#38
teslashibe wants to merge 9 commits into
mainfrom
feat/skill-mcp-servers

Conversation

@teslashibe
Copy link
Copy Markdown
Owner

Closes #37.

Skills can now ship their own MCP server via a skill.yaml file. The host discovers, health-checks, and wires these into the per-user agent alongside the built-in engagement server — no recompilation required.

Depends on #36 (base skills pipeline). This branch merges feat/skills first, then adds the external MCP layer.

Summary

  • skill.yaml parsing: transport (http/stdio), url, command, image
  • 00006_skill_mcp_servers table with CASCADE DELETE on the parent skills row
  • Health check: smoke tools/list against the URL; marks healthy = true/false
  • Provisioner builds the MCPServers list dynamically: engagement + healthy skill servers, capped at 10
  • docker-compose.skills.yml overlay for sidecar skill containers
  • docs/SKILLS.md documents the external MCP pattern, folder layout, and constraints

What a skill folder looks like now

my-skill/
├── SKILL.md            ← uploaded to Anthropic (instructions)
├── reference.md        ← uploaded to Anthropic (lookup data)
├── skill.yaml          ← declares optional MCP server
├── mcp/                ← the server (any language)
│   ├── main.go
│   ├── go.mod
│   └── Dockerfile
└── README.md

Test plan

  • Backend builds (go build ./...)
  • make db-reset && make up — confirm 00006_skill_mcp_servers migration runs
  • Upload a skill with skill.yaml containing mcp_server.transport: http pointing at a running MCP server → verify skill_mcp_servers row with healthy = true
  • Upload a skill without skill.yaml → verify no skill_mcp_servers row (existing behavior unchanged)
  • Upload a skill with skill.yaml pointing at an unreachable URL → verify healthy = false, row persisted
  • Confirm provisioner attaches the skill MCP server to a new user's agent (visible in Anthropic API response)

Made with Cursor

teslashibe and others added 7 commits May 9, 2026 10:52
closes #35 (backend slice)

- internal/skills: Service wraps Anthropic Beta.Skills (List, Upload,
  Delete) and persists skill_id/name/description/version in Postgres.
  UploadDir parses SKILL.md frontmatter, walks the folder, preserves
  relative paths via a namedReader so the multipart upload matches
  what the API expects.
- internal/skills/zip.go: rejects path-traversal entries and zips
  without exactly one top-level folder + SKILL.md.
- cmd/skills-sync: CLI driven by SKILLS_SOURCES that bulk-uploads
  every SKILL.md folder it finds; idempotent via UNIQUE(name) + UPSERT.
- internal/agent/provision.go: SkillIDsFn opt; per-user agents now
  attach the agent toolset (bash/code-execution) alongside their MCP
  toolset only when at least one skill is configured.
- cmd/provision/main.go: optional SKILLS_AGENT_IDS env to attach
  skills at bootstrap.
- 00005_skills.sql migration with UNIQUE(name) + index on anthropic_skill_id.
- /api/skills CRUD wired in main.go; Makefile target skills-sync;
  SKILLS_SOURCES + SKILLS_AGENT_IDS in .env.example.

Co-authored-by: Cursor <cursoragent@cursor.com>
closes #35 (mobile slice)

- mobile/services/skills.ts: typed client for /api/skills CRUD; uses
  expo/fetch for the multipart upload so we can build FormData.
- mobile/app/(app)/skills.tsx: list + upload (zip via DocumentPicker)
  + delete (with confirm) + sync button. Empty state CTAs upload.
- settings.tsx: link to Skills card.
- _layout.tsx: register hidden 'skills' route so deep links work.
- docs/SKILLS.md: SKILL.md format, folder layout, sandbox constraints
  (no network, bundle wheels), make skills-sync workflow, REST surface.
- mobile/package.json: expo-document-picker dependency.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ome pattern)

Demonstrates how to build a skill whose compute can't fit in Anthropic's
sandbox (native deps, data files, side effects). Splits the skill in two:

- backend/internal/astrology + internal/mcp/platforms/astrology.go:
  pure-Go sun-sign computation registered as the astrology_birth_summary
  MCP tool. Stub for moon/ascendant/houses with comment pointing at
  mshafiee/swephgo for full Swiss Ephemeris accuracy.
- skills/astrology/{SKILL.md, reference.md}: tiny skill that tells
  Claude to gather birth data, call the MCP tool, read reference.md
  for interpretation tables, and compose a tight reading.

docs/SKILLS.md gains an 'Authoring skills for this template' section
that documents the pattern, with the astrology skill as the worked
example. Also calls out the constraints surfaced during live testing:
no nested archives (.whl/.zip/.tar/.tgz/.gz are silently skipped) and
display_title uniqueness on re-upload (issue #35 follow-up to fix
via Versions.New).

Co-authored-by: Cursor <cursoragent@cursor.com>
Surfaced during live testing of the astrology skill upload:

- ~/.claude/skills/foo is canonically a symlink at the real repo. The
  sync's e.IsDir() check returned false on symlink entries; switch to
  os.Stat (follows symlinks) and resolve via filepath.EvalSymlinks
  before walking, since filepath.Walk does not follow symlinks itself.
  Preserve the visible folder name in the uploaded paths so the skill
  still appears as 'foo' to Anthropic even when the symlink target is
  named differently.
- Anthropic's API rejects 'Skill cannot contain nested zip files'.
  Skip .whl, .zip, .tar, .tgz, .gz extensions in openSkillFiles so
  the rest of the skill still uploads. The astrology skill ships
  pyswisseph wheels under scripts/wheels/ that triggered this.

Co-authored-by: Cursor <cursoragent@cursor.com>
closes the H1/H2/H3 + M1/M2/M3 findings from the forensic audit on PR #36
and three more constraints surfaced during live testing.

H1 — display_title uniqueness on re-upload (audit + live)
  Anthropic rejects Skills.New when a skill with that display_title
  already exists. Service.UploadDir now looks up the prior anthropic
  skill ID for that name and uses Beta.Skills.Versions.New on
  collision; only first uploads call Beta.Skills.New.

H2 — newly-uploaded skills don't reach cached per-user agents
  refreshAgentsAsync fires after every successful upload/delete: it
  enumerates every users.anthropic_agent_id and pushes the current
  skill list via Beta.Agents.Update. Fired-and-forgotten so upload
  latency stays low; failures are logged.

H3 — Fiber's 4 MiB default body limit rejects wheel-bundled skills
  Bumped fiber.Config.BodyLimit to 64 MiB (matches Anthropic's
  per-skill cap). Verified live with a 6 MiB upload that previously
  413'd.

NEW: Cannot delete skill while versions exist
  deleteAllVersions enumerates every version via the Versions service
  and deletes each before calling Skills.Delete. 404s on individual
  versions are swallowed.

NEW: SDK helper BetaManagedAgentsSkillParamsOfCustom omits required Type
  Returns 400 'skills[0].type: Field required'. Wrapped in skills.SkillParams
  which sets Type to BetaManagedAgentsCustomSkillParamsTypeCustom.
  Provisioner refactored to use it.

NEW: Anthropic's cloud cannot reach loopback MCP URLs
  createAgent now fails fast with an actionable hint pointing at
  ngrok/cloudflared instead of waiting for Anthropic to return a
  generic 400.

M1 — silent truncation at 20-skill cap
  AnthropicIDs now logs the names of skills it dropped.

M2 — brittle 404 detection
  Added is404 helper using errors.As(*anthropic.Error{}); replaces the
  strings.Contains heuristic in Delete + deleteAllVersions.

M3 — bash-source brittleness in Makefile
  Dropped the 'source backend/.env' incantation; the CLI uses
  godotenv.Load() so the Makefile target is one line now.

Bonus: Astrology binding sets NoCredentials: true so the demonstration
tool doesn't trigger the credentials guard for credentialless plugins.
skills-sync waits 2s before exiting so the async refresh goroutine can
drain (was logging 'closed pool' in CLI output).

Co-authored-by: Cursor <cursoragent@cursor.com>
Skills can now ship their own MCP server via a `skill.yaml` file.
The host discovers, health-checks, and wires these into the per-user
agent alongside the built-in engagement server — the agent gains
the skill's tools without recompiling the backend.

- skill.yaml parsing: transport (http/stdio), url, command, image
- 00006_skill_mcp_servers.sql: persists the server config per skill
  with CASCADE DELETE on the parent skills row
- Health check: smoke-tests tools/list before marking healthy
- HealthyMCPServers / RecheckHealth queries for the provisioner and
  a future admin endpoint
- Provisioner (provision.go): builds the MCPServers list dynamically
  from engagement + healthy skill servers. Enforces the 10-server
  cap with a logged warning on overflow.
- docker-compose.skills.yml overlay for running skill sidecars
- docs/SKILLS.md documents the pattern, layout, and constraints

The in-process astrology demo stays as a working example; extracting
it to its own repo + MCP server is tracked in #37 step 6.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ep 6+8)

Remove internal/astrology/ and platforms/astrology.go from the template.
The astrology compute now lives in teslashibe/astrology-skill as a
standalone Go MCP server wrapping the real pyswisseph chart.py.

- Drop Astrology() from platforms.All()
- Update docker-compose.skills.yml with a real sidecar entry that
  builds from ../astrology-skill
- Delete skills/astrology/ (bundled slim skill) — the canonical
  source is now the astrology-skill repo itself

Template ships zero domain-specific tools. All 14 social platforms
stay in-process (shared auth/rate-limits/credential storage).
New capabilities come from external skill repos with their own MCP
servers.

Co-authored-by: Cursor <cursoragent@cursor.com>
@teslashibe
Copy link
Copy Markdown
Owner Author

Forensic Audit: PR #38 vs Issue #37

Audit of feat(skills): external MCP server per skill against the acceptance criteria in issue #37.


1. Findings (severity-ordered, evidence first)

HIGH — Overflow break silently drops remaining skill servers

File: backend/internal/agent/provision.go:292-294 (pre-fix)

The provisioner uses break after logging the first overflow skill, which exits the loop immediately. All remaining overflow skills are silently dropped without being named in the warning.

AC violation: "Given 11 healthy skill MCP servers (over the 10-cap), when provisioning, then the oldest 10 are attached and a warning is logged naming the overflow."

With 11 healthy skill servers + 1 engagement server = 12. Only 9 skill servers fit (10 cap - 1 engagement). The 10th skill server triggers the log and break. The 11th skill server is never examined or logged.

Fix applied: Changed breakcontinue with collected overflow names, single aggregated warning after the loop.

// Before (broken):
if len(mcpServers) >= maxMCPServers {
    log.Printf("agent: %d MCP server cap reached, skipping skill %q", maxMCPServers, e.Name)
    break  // ← only first overflow logged, rest silently dropped
}

// After (fixed):
var overflow []string
for _, e := range entries {
    if len(mcpServers) >= maxMCPServers {
        overflow = append(overflow, e.Name)
        continue
    }
    // ... attach server ...
}
if len(overflow) > 0 {
    log.Printf("agent: %d MCP server cap reached; %d skill server(s) not attached: %s",
        maxMCPServers, len(overflow), strings.Join(overflow, ", "))
}

MEDIUM — skill.yaml validation errors swallowed as warnings, not returned as errors

File: backend/internal/skills/skills.go:279-288 (pre-fix)

When upsertMCPServer returns a validation error (e.g. stdio transport without command array), UploadDir logs it as a warning and returns success. The Anthropic upload succeeds but the error is invisible to SyncDirs' error collection.

AC violation: "Given a skill.yaml with transport: stdio but no command array, then sync returns an error for that skill and continues with the rest."

The SyncDirs error slice never sees this error because UploadDir returns (sk, nil).

Fix applied: Both readSkillYAML parse errors and upsertMCPServer validation/DB errors now return from UploadDir as errors, so SyncDirs collects them and the operator sees them in the sync results.


MEDIUM — refreshAgentsAsync doesn't update MCPServers on existing agents

File: backend/internal/skills/skills.go:310-346

refreshAgentsAsync pushes Skills (instruction bundles) to every cached per-user agent via BetaAgentUpdateParams{Skills: skillUnion}. It does NOT update MCPServers or Tools. This means when a new skill with an MCP server is uploaded, existing agents receive the SKILL.md instructions but cannot reach the skill's tools because the MCP server isn't wired in.

The ACs test new-user provisioning ("when a new user creates their first session") which works correctly. But existing users have a stale MCPServers list until their agent is re-created.

Fix applied: Added SetAgentRefreshHook callback on skills.Service that fires after the Skills update loop in refreshAgentsAsync. The provisioner can wire this hook to push MCPServers changes. The hook is currently unwired (no provisioner method exists yet) — see follow-up actions.


MEDIUM — No unit tests for skill MCP server code paths

Files: backend/internal/skills/mcp.go, backend/internal/agent/provision.go

Zero test files exist under backend/ for:

  • healthCheck() (valid response, unreachable URL, non-200, malformed JSON)
  • validate() (http without URL, stdio without command, unknown transport)
  • upsertMCPServer (INSERT, ON CONFLICT UPDATE, healthy flag)
  • HealthyMCPServers (filters unhealthy, filters non-http, order by created_at ASC)
  • Provisioner overflow behavior (cap=10, engagement + 9 skills)
  • refreshAgentsAsync hook invocation

LOW — No explicit warning log when health check fails

File: backend/internal/skills/skills.go:286 (pre-fix)

The log message "skills: %s MCP server registered (transport=%s healthy=%v)" doesn't distinguish between a successful registration and a failed health check. When healthy=false, the AC says "a warning is logged."

Fix applied: Split into two log paths — info for healthy, explicit WARNING for unhealthy with URL and retry guidance.


LOW — RecheckHealth silently continues on individual errors

File: backend/internal/skills/mcp.go:151-159

Both rows.Scan errors (line 154) and pool.Exec errors (line 158) are swallowed with bare continue — no log output. Operators can't diagnose why a recheck didn't update a server's health flag.

Fix applied: Added log.Printf for scan errors, update errors, and unhealthy servers.


LOW — healthCheck uses http.DefaultClient

File: backend/internal/skills/mcp.go:181

http.DefaultClient follows up to 10 redirects, which is unnecessary for an MCP health check and could be exploited by a misconfigured/malicious skill.yaml pointing at a redirect chain. Minor security concern. The 5-second context timeout mitigates the worst case.

Not fixed — trivial risk at current scale. Recommended: use a &http.Client{CheckRedirect: func(...) error { return http.ErrUseLastResponse }}.


2. Gaps vs Issue Intent

AC Status Gap
skill.yaml parsing → upload + row with healthy=true ✅ Implemented None
No skill.yaml → Anthropic only, no row ✅ Implemented None
2 healthy servers → 3 MCPServers + 3 MCPToolsets ✅ Implemented None
11 servers over cap → oldest 10 attached + warning naming overflow ⚠️ Fixed Was only logging first overflow, break instead of continue
URL returns valid tools/list → healthy=true ✅ Implemented None
Unreachable URL → healthy=false + warning + row persisted ⚠️ Fixed Warning log wasn't explicit enough
Astrology sidecar → routes through skill MCP ✅ Architecture correct End-to-end test requires running astrology-skill repo
stdio without command → error returned ⚠️ Fixed Was swallowed as warning, not returned as error
MCP server crash → tool_error, other tools continue ✅ Architecture correct Anthropic platform behavior; separate MCPServer entries give isolation

3. User Stories

  • As a skill author, I want to ship my skill as a self-contained repo with its own MCP server (in any language), so that I can add compute capabilities without modifying the host binary.
  • As a template forker, I want make skills-sync to discover skill.yaml and wire each skill's MCP server alongside built-in tools, so compute-heavy skills work out of the box.
  • As a backend operator, I want health checks on skill MCP servers so a misconfigured skill doesn't silently break agent provisioning.
  • As a backend operator, I want clear warnings when the 10-server cap is hit, naming every overflow skill, so I know exactly which skills are being dropped.

4. Acceptance Criteria (verified)

  • Given valid skill.yaml with http transport + URL, when sync runs, then row inserted with healthy=true (assuming reachable)
  • Given no skill.yaml, when sync runs, then Anthropic upload only, no row
  • Given 2 healthy skill MCP servers, when new user provisions, then 3 MCPServers + 3 MCPToolsets
  • Given 11 skill servers over cap, when provisioning, then oldest 10 attached + warning naming ALL overflow (FIXED)
  • Given reachable URL with valid tools/list, when syncing, then healthy=true
  • Given unreachable URL, when syncing, then healthy=false + explicit warning + row persisted (FIXED)
  • Given astrology sidecar running, when agent calls tool, then routes through skill MCP server (architecture verified)
  • Given stdio without command, then sync returns error + continues (FIXED)
  • Given MCP server crashes mid-session, then tool_error for that tool, others continue (architecture verified)

5. Risks and Follow-up Actions

# Risk / Action Severity Status
1 refreshAgentsAsync doesn't push MCPServers to existing agents — SetAgentRefreshHook added but needs provisioner wiring Medium Hook added, provisioner method TODO
2 No unit tests for any MCP server code paths Medium Needs test suite
3 healthCheck uses http.DefaultClient (follows redirects) Low Document or fix
4 BetaAgentUpdateParams.Version not set in refreshAgentsAsync (pre-existing) Low May cause silent failures if Anthropic enforces it
5 docker-compose.skills.yml build context points to ../astrology-skill — requires the external repo to be cloned as a sibling Low Document in SKILLS.md or use image-pull pattern

Assumptions

  • Anthropic's BetaAgentUpdateParams tolerates Version: 0 (existing behavior, not verified)
  • Anthropic Managed Agents isolate MCP server failures per-toolset (architecture assumption)
  • created_at ASC ordering in HealthyMCPServers matches the issue's "oldest" semantics

Unknowns

  • Whether Anthropic ties conversation history to the agent ID (affects re-provisioning strategy)
  • Whether BetaAgentUpdateParamsToolUnion is structurally compatible with BetaAgentNewParamsToolUnion for the refresh hook implementation

teslashibe and others added 2 commits May 10, 2026 17:55
- HIGH: overflow break→continue so ALL overflow skill MCP servers
  are named in the warning, not just the first
- MEDIUM: skill.yaml validation errors (e.g. stdio without command)
  now returned from UploadDir so SyncDirs surfaces them as errors
- MEDIUM: added SetAgentRefreshHook on skills.Service so the
  provisioner can push MCPServers changes to existing agents
- LOW: explicit WARNING log when health check marks a server unhealthy
- LOW: RecheckHealth now logs individual scan/update/unhealthy errors
  instead of silently continuing

Co-authored-by: Cursor <cursoragent@cursor.com>
Managed Agents emit session.status_idle between tool-call rounds (while
waiting for MCP results) as well as after the agent's final text. The
previous code exited on any status_idle after seeing activity, which
closed the stream before tool results and the final reading arrived.

Track pendingTools: tool_use increments, tool_result decrements. Only
forward status_idle as 'done' when pendingTools == 0 and at least one
text/tool event was seen this turn.

Also add per-event debug logging (agent stream [session]: type=...)
to diagnose Managed Agents Beta event ordering issues.

Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(skills): external MCP server per skill ('skill calls home' v2)

1 participant